Fitted Q-iteration in continuous action-space MDPs
Authors
Abstract
We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function based algorithms for continuous state and action problems.
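The variant described above replaces the continuous-action greedy step with a search over a small candidate policy set, picking the policy with the highest average action value on the batch states. A minimal sketch, assuming a user-supplied regressor and candidate policies (all names are illustrative, not the paper's actual implementation):

```python
import numpy as np

def fitted_q_iteration(batch, candidate_policies, fit_regressor,
                       gamma=0.95, n_iters=20):
    """Hypothetical sketch of the fitted Q-iteration variant.

    batch: list of (s, a, r, s_next) transitions (numpy arrays).
    fit_regressor(X, y) must return a callable q(s, a).
    """
    q = lambda s, a: 0.0          # initial Q-function
    policy = candidate_policies[0]
    for _ in range(n_iters):
        # Regression targets: one-step backup r + gamma * Q(s', pi(s')).
        X = np.array([np.concatenate([s, a]) for s, a, r, s2 in batch])
        y = np.array([r + gamma * q(s2, policy(s2)) for s, a, r, s2 in batch])
        q = fit_regressor(X, y)
        # Policy step: instead of a pointwise argmax over continuous actions,
        # pick the candidate policy maximizing the AVERAGE action value.
        states = [s for s, a, r, s2 in batch]
        policy = max(candidate_policies,
                     key=lambda pi: np.mean([q(s, pi(s)) for s in states]))
    return q, policy
```

The restricted policy set is what makes the maximization tractable: evaluating a handful of candidate policies on the batch replaces a continuous optimization at every state.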
Similar Papers
Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture [1] with a variant of fitted value iteration using importance sampling. The method thus obtained combines...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
Finite-Time Bounds for Sampling-Based Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision problems (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The con...
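The generative-model setting described here admits a compact sketch: at each iteration, sample next states through the model and regress a new value function on one-step Bellman backups. This is a hedged illustration only; the function names, the finite action grid, and the pluggable regressor are assumptions, not the paper's construction:

```python
import numpy as np

def sampled_fvi(step, sample_states, actions, fit, gamma=0.95, n_iters=10):
    """Hypothetical sketch of sampling-based fitted value iteration.

    step(s, a) -> (r, s_next): the generative model of the environment.
    sample_states() -> iterable of states used as regression inputs.
    fit(S, targets) must return a callable V(s).
    """
    V = lambda s: 0.0  # initial value function
    for _ in range(n_iters):
        S = sample_states()
        targets = []
        for s in S:
            # One-step Bellman backup, maximized over a sampled action set.
            backups = [r + gamma * V(s2)
                       for (r, s2) in (step(s, a) for a in actions)]
            targets.append(max(backups))
        V = fit(np.array(S), np.array(targets))
    return V
```

With a deterministic toy model the iterates quickly order states by long-run reward, which is the qualitative behavior the finite-time bounds quantify.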
Evaluation of Batch-Mode Reinforcement Learning Methods for Solving DEC-MDPs with Changing Action Sets
DEC-MDPs with changing action sets and partially ordered transition dependencies have recently been suggested as a sub-class of general DEC-MDPs that features provably lower complexity. In this paper, we investigate the usability of a coordinated batch-mode reinforcement learning algorithm for this class of distributed problems. Our agents acquire their local policies independent of the other a...
Reinforcement Learning with a Bilinear Q Function
Many reinforcement learning methods are based on a function Q(s, a) whose value is the discounted total reward expected after performing the action a in the state s. This paper explores the implications of representing the Q function as Q(s, a) = sWa, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors that may have high dimension. We show that acti...
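One attraction of the bilinear form Q(s, a) = sᵀWa is that greedy action selection over continuous actions becomes closed-form: since Q is linear in a, the maximizer over the unit ball is Wᵀs / ‖Wᵀs‖. A minimal sketch of that computation (a unit-norm action constraint is assumed here for illustration; it is not stated in the excerpt above):

```python
import numpy as np

def bilinear_q(s, W, a):
    """Q(s, a) = s^T W a, with s and a real-valued vectors."""
    return float(s @ W @ a)

def greedy_action(s, W):
    """Greedy action under the constraint ||a|| <= 1.

    Because Q(s, a) = (W^T s) . a is linear in a, the maximizer over
    the unit ball is the normalized direction W^T s / ||W^T s||.
    """
    v = W.T @ s
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

This closed-form argmax is what makes the representation practical in high-dimensional continuous action spaces, where a numerical search per state would otherwise be required.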